Revision control

Example history tree of a revision-controlled project.

Revision control, also known as version control, source control or software configuration management (SCM), is the management of changes to documents, programs, and other information stored as computer files. It is most commonly used in software development, where a team of people may change the same files. Changes are usually identified by a number or letter code, termed the "revision number", "revision level", or simply "revision". For example, an initial set of files is "revision 1". When the first change is made, the resulting set is "revision 2", and so on. Each revision is associated with a timestamp and the person making the change. Revisions can be compared, restored, and with some types of files, merged.
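As a rough illustration of the numbering scheme described above, the following minimal sketch keeps every revision in memory and numbers them from 1. The names `Revision` and `History` are hypothetical and are not taken from any particular revision control system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    number: int          # sequential revision number, starting at 1
    author: str          # person who made the change
    timestamp: datetime  # when the revision was recorded
    files: dict          # file name -> full content at this revision


class History:
    """Hypothetical in-memory history; real systems persist this data."""

    def __init__(self):
        self._revisions = []

    def commit(self, author, files):
        # Record a new revision and return its number.
        rev = Revision(len(self._revisions) + 1, author,
                       datetime.now(timezone.utc), dict(files))
        self._revisions.append(rev)
        return rev.number

    def restore(self, number):
        # Return a copy of the file set as of the given revision.
        if not 1 <= number <= len(self._revisions):
            raise KeyError(f"no such revision: {number}")
        return dict(self._revisions[number - 1].files)

    def changed_files(self, old, new):
        # Names of files whose content differs between two revisions.
        a, b = self.restore(old), self.restore(new)
        return sorted(name for name in a.keys() | b.keys()
                      if a.get(name) != b.get(name))
```

Storing a full copy of every file per revision is the simplest possible design; it is used here only to keep the sketch short.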

Version control systems (VCSes; singular VCS) most commonly run as stand-alone applications, but revision control is also embedded in various types of software, such as word processors (e.g., OpenOffice.org Writer, Microsoft Word, KWord, Pages), spreadsheets (e.g., OpenOffice.org Calc, Microsoft Excel, KSpread, Numbers), and various content management systems (e.g., Drupal, Joomla, WordPress). Integrated revision control is a key feature of wiki software packages such as MediaWiki, DokuWiki and TWiki. In wikis, revision control makes it possible to revert a page to a previous revision, which is critical for allowing editors to track each other's edits, correct mistakes, and defend public wikis against vandalism and spam.

Software tools for revision control are essential for the organization of multi-developer projects.[1]

Overview

Engineering revision control developed from formalized processes based on tracking revisions of early blueprints or bluelines. This system of control implicitly allowed returning to any earlier state of the design, for cases in which an engineering dead-end was reached in the development of the design. Likewise, in computer software engineering, revision control is any practice that tracks and provides control over changes to source code. Software developers sometimes use revision control software to maintain documentation and configuration files as well as source code. Also, version control is widespread in business and law. Indeed, "contract redline" and "legal blackline" are some of the earliest forms of revision control, and are still employed with varying degrees of sophistication. An entire industry has emerged to service the document revision control needs of business and other users, and some of the revision control technology employed in these circles is subtle, powerful, and innovative. The most sophisticated techniques are beginning to be used for the electronic tracking of changes to CAD files (see product data management), supplanting the "manual" electronic implementation of traditional revision control.

As teams design, develop and deploy software, it is common for multiple versions of the same software to be deployed at different sites and for the software's developers to be working simultaneously on updates. Bugs and features of the software are often present only in certain versions (because some problems are fixed and others introduced as the program develops). Therefore, for the purposes of locating and fixing bugs, it is vitally important to be able to retrieve and run different versions of the software to determine in which version(s) the problem occurs. It may also be necessary to develop two versions of the software concurrently, for instance where one version has bugs fixed but no new features (branch), while the other version is where new features are worked on (trunk).
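Determining the version in which a problem first occurs can be automated once old versions are retrievable. The sketch below uses a binary search under two stated assumptions: the revisions are in chronological order, and once the bug appears it remains in every later revision. The function name and the `is_bad` callback are hypothetical.

```python
def first_bad_revision(revisions, is_bad):
    """Return the earliest revision for which is_bad(revision) is true.

    Assumes the last revision is bad and that the bug, once introduced,
    is present in every later revision.
    """
    if not revisions:
        raise ValueError("no revisions to search")
    lo, hi = 0, len(revisions) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid        # bug already present: look at earlier revisions
        else:
            lo = mid + 1    # still good: the bug was introduced later
    return revisions[lo]
```

In practice `is_bad` would check out the given revision, build it, and run a test; the search then needs only about log2(n) such checks for n revisions.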

At the simplest level, developers could retain multiple copies of the different versions of the program, and label them appropriately. This approach has been used on many large software projects. While this method can work, it is inefficient, as many near-identical copies of the program have to be maintained. It also requires a lot of self-discipline on the part of developers, and often leads to mistakes. Consequently, systems to automate some or all of the revision control process have been developed.

Moreover, in software development, legal and business practice and other environments, it has become increasingly common for a single document or snippet of code to be edited by a team, the members of which may be geographically dispersed and may pursue different and even contrary interests. Sophisticated revision control that tracks and accounts for ownership of changes to documents and code may be extremely helpful or even necessary in such situations.

Revision control may also track changes to configuration files, such as those typically stored in /etc or /usr/local/etc on Unix systems. This gives system administrators another way to easily track changes made and a way to roll back to earlier versions should the need arise.

Source-management models

Traditional revision control systems use a centralized model where all the revision control functions take place on a shared server. If two developers try to change the same file at the same time, then without some method of managing access they may end up overwriting each other's work. Centralized revision control systems solve this problem with one of two "source management models": file locking and version merging.

Atomic operations

Computer scientists call an operation atomic if the system is left in a consistent state even if the operation is interrupted. The commit operation is usually the most critical in this sense. A commit tells the revision control system to make a group of changes final and available to all users. Not all revision control systems have atomic commits; notably, the widely used CVS lacks this feature.
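One way to picture an atomic commit is an all-or-nothing update: the group of changes is applied to a private copy, and the result is published in a single step only if every change succeeded. The following is a minimal in-memory sketch under that assumption; the class `AtomicRepository` is hypothetical and does not describe how any specific system implements commits.

```python
class AtomicRepository:
    """Hypothetical repository in which a commit applies fully or not at all."""

    def __init__(self):
        self.files = {}     # path -> content of the latest revision
        self.revision = 0

    def commit(self, changes):
        """Apply a group of changes; a value of None deletes the path."""
        staged = dict(self.files)       # work on a private copy
        for path, content in changes.items():
            if content is None:
                if path not in staged:
                    raise KeyError(f"cannot delete missing file: {path}")
                del staged[path]
            else:
                staged[path] = content
        # Reached only if every change applied cleanly: publish in one step.
        self.files = staged
        self.revision += 1
        return self.revision
```

If any single change fails, the exception leaves `files` and `revision` untouched, so other users never observe a half-applied commit.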

File locking

The simplest method of preventing "concurrent access" problems involves locking files so that only one developer at a time has write access to the central "repository" copies of those files. Once one developer "checks out" a file, others can read that file, but no one else may change that file until that developer "checks in" the updated version (or cancels the checkout).
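The check-out and check-in cycle described above can be sketched as a lock table that maps each file to the user currently holding it. The class and method names below are hypothetical, and the sketch ignores networking and persistence.

```python
class LockError(Exception):
    """Raised when a user tries to act on a file locked by someone else."""


class LockingRepository:
    def __init__(self, files):
        self._files = dict(files)   # path -> content
        self._locks = {}            # path -> user holding the lock

    def read(self, path):
        # Reading is always allowed, even while the file is locked.
        return self._files[path]

    def check_out(self, path, user):
        holder = self._locks.get(path)
        if holder is not None and holder != user:
            raise LockError(f"{path} is locked by {holder}")
        self._locks[path] = user
        return self._files[path]

    def check_in(self, path, user, content):
        self._require_lock(path, user)
        self._files[path] = content
        del self._locks[path]       # release the lock

    def cancel_checkout(self, path, user):
        self._require_lock(path, user)
        del self._locks[path]

    def _require_lock(self, path, user):
        if self._locks.get(path) != user:
            raise LockError(f"{user} does not hold the lock on {path}")
```

While one user holds the lock, a second `check_out` of the same path raises `LockError`, whereas `read` continues to return the last checked-in content.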

File locking has both merits and drawbacks. It can provide some protection against difficult merge conflicts when a user is making radical changes to many sections of a large file (or group of files). However, if the files are left exclusively locked for too long, other developers may be tempted to bypass the revision control software and change the files locally, leading to more serious problems.

Version merging

Most version control systems allow multiple developers to edit the same file at the same time. The first developer to "check in" changes to the central repository always succeeds. The system may provide facilities to merge further changes into the central repository, and preserve the changes from the first developer when other developers check in.

Merging two files can be a very delicate operation, and is usually possible only if the data structure is simple, as in text files. Merging two image files, for example, might not produce a valid image file at all. The second developer checking in code will need to take care with the merge, to make sure that the changes are compatible and that the merge operation does not introduce its own logic errors within the files. These problems limit the availability of automatic or semi-automatic merge operations mainly to simple text-based documents, unless a specific merge plugin is available for the file types.
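For text files, a merge is commonly reasoned about relative to the version both developers started from. The sketch below is a deliberately simplified line-by-line merge: it assumes that both developers only changed lines in place, without adding or removing lines, which real merge tools do not require. The function name is hypothetical.

```python
def merge_lines(base, mine, theirs):
    """Merge two edited copies of `base`, line by line.

    Returns (merged_lines, conflict_indexes). Simplifying assumption:
    all three versions have the same number of lines.
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for i, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t:          # both sides agree (or neither changed the line)
            merged.append(m)
        elif m == b:        # only the other developer changed this line
            merged.append(t)
        elif t == b:        # only this developer changed this line
            merged.append(m)
        else:               # both changed the same line differently
            merged.append(b)
            conflicts.append(i)
    return merged, conflicts
```

Lines reported in `conflict_indexes` keep their original content and must be resolved by a person. Even a merge without conflicts can combine changes that are individually correct but incompatible, which is why the result still needs review.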

The concept of a reserved edit can provide an optional means to explicitly lock a file for exclusive write access, even when a merging capability exists.

Distributed revision control

Distributed revision control (DRCS) takes a peer-to-peer approach, as opposed to the client-server approach of centralized systems. Rather than a single, central repository on which clients synchronize, each peer's working copy of the codebase is a bona-fide repository.[2] Distributed revision control conducts synchronization by exchanging patches (change-sets) from peer to peer. This results in some important differences from a centralized system:

In particular, communication is necessary only when pushing or pulling changes to or from other peers.
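The exchange of change-sets between peers can be sketched as follows. Each peer holds its own complete set of change-sets, commits are purely local, and data moves only during an explicit pull or push. The `Peer` class and its string identifiers are hypothetical, and the sketch leaves out merging of the pulled changes.

```python
class Peer:
    """Hypothetical peer whose local store is a full repository."""

    def __init__(self, name):
        self.name = name
        self.changesets = {}    # change-set id -> payload

    def commit(self, changeset_id, payload):
        # A commit is local: no other peer is contacted.
        self.changesets[changeset_id] = payload

    def pull(self, other):
        # Copy the change-sets that `other` has and this peer lacks.
        missing = {cid: payload
                   for cid, payload in other.changesets.items()
                   if cid not in self.changesets}
        self.changesets.update(missing)
        return sorted(missing)

    def push(self, other):
        # Pushing is the same transfer, seen from the sending side.
        return other.pull(self)
```

For example, if two peers each commit one change-set independently, a pull in one direction followed by a push in the other leaves both with the same two change-sets.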

Open systems

An "open system" of distributed revision control is characterized by its support for independent branches, and its heavy reliance on merge operations. Its general characteristics include:

One of the first open systems, BitKeeper, served in the development of the Linux kernel. When the makers of BitKeeper decided in 2005 to restrict its licensing,[4] Linux developers looked for a free replacement.

As of 2010, common open systems in free use include:

For a full list, see the comparison of revision control software.

Replicated systems

A replicated system of distributed revision control depends on a replicated database. A check-in is equivalent to a distributed commit. Successful commits create a single baseline, which reduces the need for merges. An example of a replicated distributed system is Code Co-op.

Integration

Some of the more advanced revision-control tools offer many other facilities, allowing deeper integration with other tools and software-engineering processes. Plugins are often available for IDEs such as Oracle JDeveloper, IntelliJ IDEA, Eclipse and Visual Studio. NetBeans IDE and Xcode come with integrated version control support.

Common vocabulary

Terminology can vary from system to system, but some terms in common usage include:[8][9]

Baseline 
An approved revision of a document or source file from which subsequent changes can be made. See the discussion of baselines, labels, and tags (below).
Branch 
A set of files under version control may be branched or forked at a point in time so that, from that time forward, two copies of those files may develop at different speeds or in different ways independently of each other.
Change 
A change (or diff, or delta) represents a specific modification to a document under version control. The granularity of the modification considered a change varies between version control systems.
Change list 
On many version control systems with atomic multi-change commits, a changelist, change set, or patch identifies the set of changes made in a single commit. This can also represent a sequential view of the source code, allowing the examination of source "as of" any particular changelist ID.
Checkout 
A check-out (or co) creates a local working copy from the repository. A user may specify a specific revision or obtain the latest.
Commit 
A commit (checkin, ci or, more rarely, install, submit or record) writes or merges the changes made in the working copy into the repository.
Conflict 
A conflict occurs when different parties make changes to the same document, and the system is unable to reconcile the changes. A user must resolve the conflict by combining the changes, or by selecting one change in favour of the other.
Delta compression 
Most revision control software uses delta compression, which retains only the differences between successive versions of files. This allows for more efficient storage of many different versions of files.
Dynamic stream 
A stream in which some or all file versions are mirrors of the parent stream's versions.
Export 
An export resembles a check-out except that it creates a clean directory tree without the version-control metadata used in a working copy. Often used prior to publishing the contents.
Head 
The most recent commit.
Import 
An import involves copying a local directory tree (that is not currently a working copy) into the repository for the first time.
Label 
See tag.
Mainline 
Similar to trunk, but there can be a mainline for each branch.
Merge 
A merge or integration is an operation in which two sets of changes are applied to a file or set of files. Some sample scenarios are as follows:
  • A user, working on a set of files, updates or syncs their working copy with changes made, and checked into the repository, by other users.[10]
  • A user tries to check-in files that have been updated by others since the files were checked out, and the revision control software automatically merges the files (typically, after prompting the user if it should proceed with the automatic merge, and in some cases only doing so if the merge can be clearly and reasonably resolved).
  • A set of files is branched, a problem that existed before the branching is fixed in one branch, and the fix is then merged into the other branch.
  • A branch is created, the code in the files is independently edited, and the updated branch is later incorporated into a single, unified trunk.
Promote 
The act of copying file content from a less controlled location into a more controlled location. For example, from a user's workspace into a repository, or from a stream to its parent.[11]
Repository 
The repository is where files' current and historical data are stored, often on a server. Sometimes also called a depot (for example, by SVK, AccuRev and Perforce).
Resolve 
The act of user intervention to address a conflict between different changes to the same document.
Reverse integration 
The process of merging different team branches into the main trunk of the versioning system.
Revision 
Also version: A version is any change in form. In SVK, a Revision is the state at a point in time of the entire tree in the repository.
Ring
See tag.
Share
The act of making one file or folder available in multiple branches at the same time. When a shared file is changed in one branch, it is changed in other branches.
Stream 
A container for branched files that has a known relationship to other such containers. Streams form a hierarchy; each stream can inherit various properties (like versions, namespace, workflow rules, subscribers, etc.) from its parent stream.
Tag 
A tag or label refers to an important snapshot in time, consistent across many files. These files at that point may all be tagged with a user-friendly, meaningful name or revision number. See the discussion of baselines, labels, and tags (below).
Trunk
The unique line of development that is not a branch (sometimes also called Baseline or Mainline).
Update 
An update (or sync) merges changes made in the repository (by other people, for example) into the local working copy.[10]
Working copy
The working copy is the local copy of files from a repository, at a specific time or revision. All work done to the files in a repository is initially done on a working copy, hence the name. Conceptually, it is a sandbox.
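The delta compression entry above can be illustrated with Python's standard `difflib` module. In this sketch a new version is described as ranges copied from the old version plus the lines that are genuinely new, so unchanged lines are never stored twice. The delta format and the function names `make_delta` and `apply_delta` are hypothetical, not the storage format of any real system.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Describe `new` as copy ranges from `old` plus literal added lines."""
    delta = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))      # reuse old[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("add", new[j1:j2]))   # store only the new lines
        # "delete": nothing to record; those old lines are simply not copied
    return delta


def apply_delta(old, delta):
    """Rebuild the new version from the old version and a delta."""
    new = []
    for op in delta:
        if op[0] == "copy":
            new.extend(old[op[1]:op[2]])
        else:
            new.extend(op[1])
    return new
```

A repository built this way would keep one full version of each file and a chain of deltas, reconstructing any other version by applying the deltas in sequence.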

Baselines, labels, and tags

Most often only one of the terms baseline, label, or tag is used in documentation or discussion, and the three can be considered synonyms. Most revision control tools will use only one of these similar terms (baseline, label, tag) to refer to the action of identifying a snapshot ("label the project") or the record of the snapshot ("try it with baseline X"). However, in most projects some snapshots are more significant than others, such as those used to indicate published releases, branches, or milestones.

When the term baseline is used together with label or tag in the same context, label and tag usually refer to the mechanism within the tool for identifying or recording the snapshot, and baseline indicates the increased significance of a given label or tag.

Most formal discussion of configuration management uses the term baseline.

References

  1. "Rapid Subversion Adoption Validates Enterprise Readiness and Challenges Traditional Software Configuration Management Leaders". EETimes. May 17, 2007. http://www.eetimes.com/press_releases/bizwire/showPressRelease.jhtml?articleID=608063&CompanyId=2. Retrieved June 1, 2007. "Version management is essential to software development and is considered the most critical component of any development environment." 
  2. Wheeler, David. "Comments on Open Source Software / Free Software (OSS/FS) Software Configuration Management (SCM) Systems". http://www.dwheeler.com/essays/scm.html. Retrieved May 8, 2007. 
  3. O'Sullivan, Bryan. "Distributed revision control with Mercurial". http://hgbook.red-bean.com/hgbook.html. Retrieved July 13, 2007. 
  4. "Bitmover ends free Bitkeeper, replacement sought for managing Linux kernel code". Wikinews. April 7, 2005. http://en.wikinews.org/wiki/Bitmover_ends_free_Bitkeeper%2C_replacement_sought_for_managing_Linux_kernel_code. 
  5. "Ubuntu in Launchpad". Canonical Ltd. http://launchpad.net/ubuntu. Retrieved 2008-10-21. 
  6. Arnö, Kaj (2008-06-19). "Version Control: Thanks, BitKeeper - Welcome, Bazaar". http://blogs.mysql.com/kaj/2008/06/19/version-control-thanks-bitkeeper-welcome-bazaar/. Retrieved 2008-06-19. 
  7. "Getting Started/Sources/Amarok Git Tutorial". KDE TechBase. KDE. August 25, 2009. http://techbase.kde.org/Getting_Started/Sources/KDE_git-tutorial. Retrieved October 9, 2009. "Amarok is now developed in a Git repository instead of SVN. This was done to help get into place all the needed infrastructure to convert all of KDE, including documentation." 
  8. Collins-Sussman, Ben; Fitzpatrick, B.W. and Pilato, C.M. (2004). Version Control with Subversion. O'Reilly. ISBN 0-596-00448-6. http://svnbook.red-bean.com/. 
  9. Wingerd, Laura (2005). Practical Perforce. O'Reilly. ISBN 0-596-10185-6. http://safari.oreilly.com/0596101856. 
  10. Collins-Sussman, Ben; Fitzpatrick, B.W. and Pilato, C.M. "Version Control with Subversion". http://svnbook.red-bean.com/en/1.5/svn.tour.cycle.html#svn.tour.cycle.resolve. Retrieved 8 June 2010. "The G stands for merGed, which means that the file had local changes to begin with, but the changes coming from the repository didn't overlap with the local changes." 
  11. Accurev Concepts Manual, Version 4.7. Accurev, Inc. July 2008. 
